Conversation
e470334 to
9b98673
Compare
9b98673 to
7fbc869
Compare
| #[derive(Debug, Clone, Default)] | ||
| pub enum LabelExtraction { | ||
| /// Use only the local name: the fragment (`#name`) or last path segment. | ||
| /// For example, `http://example.org/ns/knows` → `"knows"`. |
There was a problem hiding this comment.
| /// For example, `http://example.org/ns/knows` → `"knows"`. | |
| /// For example, `http://example.org/ns/knows` → `knows`. |
I don't think there should be any quotation marks here so as not to confuse it with literals.
| #[default] | ||
| LocalName, | ||
| /// Use the full IRI string as the label. | ||
| /// For example, `http://example.org/ns/knows` → `"http://example.org/ns/knows"`. |
| } | ||
| } | ||
|
|
||
| impl<R: Read> Iterator for NTriples<R> { |
There was a problem hiding this comment.
<...> <http://a.org/knows> <...>
<...> <http://b.org/knows> <...>
We lose info about difference between these two predicates if use LocalName. Is it good?
| } else if let Some(pos) = iri.rfind('/') { | ||
| iri[pos + 1..].to_owned() | ||
| } else { | ||
| iri.to_owned() |
There was a problem hiding this comment.
can we obtain something like <...> <http://a.org/> <...>? In this case predicate will be empty "". Should we handle this case?
|
@VanyaGlazunov Nice. Can you answer on my questions in review? |
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
Could you, please, introduce a test involving non-ascii chars. Especially I am interested in Chinese and Cyrillic support.
georgiy-belyanin
left a comment
There was a problem hiding this comment.
Thank you for working on this! Consider, please, my comment and Rodion's one.
There was a problem hiding this comment.
Pull request overview
Adds a new RDF N-Triples format parser and wires it into the in-memory graph builder so graphs can be loaded directly from .nt sources (per Issue #3).
Changes:
- Introduces
formats::nt::NTriples<R>iterator (backed byoxttl) to parse N-Triples intoEdgeitems. - Adds
GraphSource<InMemoryBuilder>support forNTriples<R>and a corresponding in-memory loading test. - Updates
formatsmodule exports and repo documentation to include the new format.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/graph/mod.rs | Minor formatting adjustment in check_graph signature. |
| src/graph/inmemory.rs | Imports NTriples, implements GraphSource for it, and adds an in-memory load test. |
| src/formats/nt.rs | New N-Triples parser/iterator implementation with label-extraction strategy and unit tests. |
| src/formats/mod.rs | Exposes nt module + NTriples re-export; extends FormatError with N-Triples-related variants. |
| AGENTS.md | Documents the new nt.rs format module and usage details. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| [`NTriples<R>`](src/formats/nt.rs:57) parses [W3C N-Triples](https://www.w3.org/TR/n-triples/) | ||
| RDF files using `oxttl`. Each triple `(subject, predicate, object)` becomes an | ||
| [`Edge`](src/graph/mod.rs:154) where: | ||
|
|
||
| - `source` — subject IRI or blank-node ID (`_:label`). | ||
| - `target` — object IRI or blank-node ID; triples whose object is an RDF | ||
| literal yield `Err(FormatError::LiteralAsNode)` (callers may filter these out). | ||
| - `label` — predicate IRI, transformed by [`LabelExtraction`](src/formats/nt.rs:36): | ||
|
|
||
| | Variant | Behaviour | | ||
| |---|---| | ||
| | `LocalName` (default) | Fragment (`#name`) or last path segment of the predicate IRI | | ||
| | `FullIri` | Full predicate IRI string | | ||
|
|
||
| Constructors: | ||
|
|
||
| - [`NTriples::new(reader)`](src/formats/nt.rs:72) — uses `LabelExtraction::LocalName`. | ||
| - [`NTriples::with_label_extraction(reader, strategy)`](src/formats/nt.rs:76) — explicit strategy. |
There was a problem hiding this comment.
Several Markdown source links here use hard-coded line numbers that don’t match the newly added src/formats/nt.rs (e.g., NTriples<R> is at line 64, LabelExtraction at 38, and the constructors at 70/74). Updating these references will prevent broken links in GitHub’s rendered documentation.
|
|
||
| /// An RDF literal appeared as a subject or object where a node IRI or | ||
| /// blank node was expected. | ||
| #[error("RDF literal cannot be used as a graph node (triple skipped)")] |
There was a problem hiding this comment.
FormatError::LiteralAsNode's display text says “(triple skipped)”, but nothing in the iterator / GraphBuilder::load path actually skips the triple—item? will propagate the error and abort loading unless the caller filters it out. Either remove the “triple skipped” wording from the error message, or implement default skipping at the GraphSource/builder layer for this specific variant so the message matches behavior.
| #[error("RDF literal cannot be used as a graph node (triple skipped)")] | |
| #[error("RDF literal cannot be used as a graph node")] |
| #[error("N-Triples parse error: {0}")] | ||
| NTriples(String), | ||
|
|
||
| /// An RDF literal appeared as a subject or object where a node IRI or |
There was a problem hiding this comment.
The doc comment for LiteralAsNode says “subject or object”, but in RDF/N-Triples a literal cannot appear in the subject position (and the parser type for subject here is NamedOrBlankNode). Consider adjusting the wording to just “object” (or “term position where an IRI/blank node was expected”) to avoid implying an impossible case.
| /// An RDF literal appeared as a subject or object where a node IRI or | |
| /// An RDF literal appeared in a term position where an IRI or |
| fn extract_label(iri: &str, strategy: &LabelExtraction) -> String { | ||
| match strategy { | ||
| LabelExtraction::FullIri => iri.to_owned(), | ||
| LabelExtraction::LocalName => { | ||
| // Fragment takes priority, then last path segment. | ||
| if let Some(pos) = iri.rfind('#') { | ||
| iri[pos + 1..].to_owned() | ||
| } else if let Some(pos) = iri.rfind('/') { | ||
| iri[pos + 1..].to_owned() | ||
| } else { | ||
| iri.to_owned() | ||
| } | ||
| } |
There was a problem hiding this comment.
extract_label can return an empty string for IRIs that end with # or / (e.g., http://example.org/ns#). That would create edges with an empty label and store them under the empty-string key in label_buffers, which is likely unintended. Consider falling back to the full IRI (or returning a parse error) when the computed local name is empty.
| impl<R: Read> GraphSource<InMemoryBuilder> for NTriples<R> { | ||
| fn apply_to(self, mut builder: InMemoryBuilder) -> Result<InMemoryBuilder, GraphError> { | ||
| for item in self { | ||
| builder.push_edge(item?)?; | ||
| } | ||
| Ok(builder) | ||
| } |
There was a problem hiding this comment.
The GraphSource impl for NTriples<R> duplicates the same for item in self { builder.push_edge(item?)?; } loop used by the existing Csv<R> impl. Since InMemoryBuilder already has with_stream (which handles IntoIterator<Item = Result<Edge, E>>), consider delegating to builder.with_stream(self) to keep the loading logic in one place and avoid divergence between format loaders over time.
Adds support for loading graphs from RDF N-Triples format as a new edge source.
closes #3 I think.